Vision Language Models (VLMs), GenAI for CV

Vision language models (VLMs) learn jointly from images and text, which lets them tackle many tasks, from visual question answering to image captioning.

Resources

Leaderboards

Models

Applications

References